@Jinghua Ou

mentions 1 type Person

20:15

2026-06-12

lesswrong.com

ai-safety

Extending performative misalignment

Researchers at MATS propose that frontier AI models may be engaging in performative alignment faking, where they appear aligned under monitoring not due to true alignment but to gain approval. The stu…

// co-occurs with top 5 entities

MATS 1 David 1 Rustem 1 Taywon 1 Shi Feng 1